Stamp collection, big data and reproducible science

Sungsam Gong, Clinical School, University of Cambridge

18 Oct 2019

Stamp collecting

  • Ernest Rutherford
  • 30 Aug 1871 - 19 Oct 1937
  • Father of nuclear physics
  • Discovery of proton
  • 4th Director of the Cavendish Laboratory
  • Nobel prize in chemistry (1908)

Lord Ernest Rutherford

  • All science is either physics or stamp collecting
  • Other variants
    • That which is not measurable is not science (by William Thomson)
    • That which is not physics is stamp collecting
    • Physics is the only real science. The rest are just stamp collecting

The Cavendish Lab

Conclusion (summary)

  • ‘Stamp collecting’ does matter in science
    • But it is a matter of cost, size, and speed
  • Big data and paradigm shift in science
    • From hypothesis testing to hypothesis generating
    • Recursive and repetitive

Stamp collectors

The ‘great’ stamp collectors

  • Charles Darwin
    • 5-year voyage from 1831
  • Gregor Mendel
    • 29,000 garden peas between 1856 and 1863
    • Laws of inheritance
  • Fred Sanger
    • Insulin sequence in the 1950s
  • Max Perutz and John Kendrew
    • 3D structure of hemoglobin in 1959
  • Dorothy Hodgkin
    • Penicillin & vitamin B12
    • 3D structure of insulin in the 1960s
  • Tom Blundell
    • EGFR, HIV protease, crystallins
    • ~200 3D structures of proteins

Insulin: Structure, Function and Evolution

- Blundell TL, Cutfield JF, Cutfield SM, Dodson GG, Dodson E, Mercola D, Vijayan M and Hodgkin DC (1971) Atomic positions in 2-Zinc insulin crystals. Nature 231, 506-511

Insulin-like Superfamily: Structures & Functions

ESST: Environment Specific Substitution Table

Local Structural Environments

Other stamps

  • Sydney Brenner (1927 – 2019) Nobel Prize 2002
  • Progress in science depends on new techniques, new discoveries, and new ideas, probably in that order

The growth of genomic data

Even bigger - multi omics

AI & ML

  • Where there’s data, there’s AI
  • But, data quality (QC) over quantity
  • Quality adjustment by AI?

Reproducible Science

Science (from the Latin scientia, meaning “knowledge”) refers to any systematic knowledge-base or prescriptive practice that is capable of resulting in a prediction or predictable type of outcome

  • Reasoning

Science & Reasoning

  • Inductive reasoning
    • Induction: reasoning from a specific case or cases to derive a general rule
  • Pattern recognition (recursive)
    • Fractal
  • A process of making general rules behind observations
    • Further to prediction
  • Saju and palja (Korean fortune-telling from the date and time of birth), gwansang (face reading)
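
The inductive step listed above (specific observations, then a general rule, then prediction) can be sketched in a few lines. This is only an illustration: the observations are invented, and the choice of a straight-line rule fitted by least squares is an assumption, not something taken from the talk.

```python
# Hypothetical observations (x, y); the values are made up for illustration.
observations = [(1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0)]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Induction: derive a general rule from the specific cases.
slope, intercept = fit_line(observations)


def predict(x):
    """Apply the induced rule to a case that was never observed."""
    return slope * x + intercept


print(slope, intercept)  # 2.0 1.0
print(predict(10))       # 21.0
```

The rule is only as good as the observations behind it: a prediction far outside the observed range (here x = 10) is exactly where an induced rule is least tested.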

Repeatability vs. reproducibility

Confounding effect

Example
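
As one possible illustration of a confounding effect, the sketch below uses invented measurements in which the processing batch (a hypothetical variable, not taken from the talk) is unevenly distributed between cases and controls. The pooled comparison suggests a group difference that disappears once the comparison is made within each batch.

```python
from statistics import mean

# Hypothetical measurements: (group, batch, value). All values are invented.
# Batch A has a higher baseline than batch B, and most cases sit in batch A.
samples = [
    ("case", "A", 9.0), ("case", "A", 10.0), ("case", "A", 11.0),
    ("case", "B", 5.0),
    ("control", "A", 10.0),
    ("control", "B", 4.0), ("control", "B", 5.0), ("control", "B", 6.0),
]


def group_mean(rows, group, batch=None):
    """Mean value for one group, optionally restricted to one batch."""
    values = [v for g, b, v in rows
              if g == group and (batch is None or b == batch)]
    return mean(values)


# Naive comparison that ignores the batch variable.
pooled_diff = group_mean(samples, "case") - group_mean(samples, "control")

# Same comparison made separately within each batch (stratification).
stratified_diff = {
    batch: group_mean(samples, "case", batch)
    - group_mean(samples, "control", batch)
    for batch in ("A", "B")
}

print(pooled_diff)      # 2.5
print(stratified_diff)  # {'A': 0.0, 'B': 0.0}
```

Stratifying by the suspected confounder is only one way to handle it; including it as a covariate in a model or balancing it in the study design are common alternatives.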

CI/CD in research